Improving Performance by Re-Rating in the Dynamic Estimation of Rater Reliability
Authors
Abstract
Crowdsourcing is now widely used in supervised machine learning to collect ratings for unlabelled training sets. To obtain good-quality results, it is worth rejecting results from noisy or unreliable raters as soon as they are discovered. Many techniques for filtering unreliable raters work by presenting training instances to the raters identified as most accurate to date. Early in the process, however, the true rater reliabilities are not yet known, so unreliable raters may be used as a result. This paper explores improving the quality of ratings for training instances by re-rating: detecting the instances affected in this way and acquiring additional ratings for them once the rating process is over. We compare different approaches to re-rating in terms of their improvements in labeling accuracy and their labeling costs.
Related articles
Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.
Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out t...
A Study of Raters’ Behavior in Scoring L2 Speaking Performance: Using Rater Discussion as a Training Tool
The studies conducted so far on the effectiveness of resolution methods including the discussion method in resolving discrepancies in rating have yielded mixed results. What is left unnoticed in the literature is the potential of discussion to be used as a training tool rather than a resolution method. The present study addresses this research gap by exploring the data coming from rating behavi...
Towards a Task-Based Assessment of Professional Competencies
Performance assessment is exceedingly considered a key concept in teacher education programs worldwide. Accordingly, in Iran, a national assessment system was proposed by Farhangian University to assess the professional competencies of its ELT graduates. The concerns regarding the validity and authenticity of traditional measures of teachers' competencies have motivated us to devise a localized...
English and Non English major Teachers’ Assessment of Oral Proficiency: a case of Iranian Maritime English Learners
Speaking assessment is still construed as a complicated, under-researched process from the vantage point of tasks and rater characteristics. The present study aimed at investigating if and how English-major and non-English-major teachers differ in their perception of the construct of oral proficiency while assessing learners’ L2 oral proficiency. To this end, 38 male and female non-native EFL...
Improving the velocity tracking of cruise control system by using adaptive methods
Accurate and correct controller performance in cruise control systems is important. Hence, in such systems, the controller should optimize itself against noise and probable changes in system dynamics. In this article, three approaches are applied toward this purpose: MIT, direct estimation and indirect estimation. These approaches are used as controllers to track refe...